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Abstract — We process a large corpus of game records of the 
board game of Go and propose a way of extracting summary 
information on played moves. We then apply several basic data- 
mining methods on the summary information to identify the 
most differentiating features within the summary information, 
and discuss their correspondence with traditional Go knowledge. 
We show statistically significant mappings of the features to 
player attributes such as playing strength or informally perceived 
"playing style" (e.g. territoriality or aggressivity), describe ac- 
curate classifiers for these attributes, and propose applications 
including seeding real-work ranks of internet players, aiding in 
Go study and tuning of Go-playing programs, or contribution to 
Go-theoretical discussion on the scope of "playing style". 

Index Terms — Board games, Evaluation, Function approxima- 
tion, Go, Machine learning, Neural networks, User modelling 



I. Introduction 

THE field of Computer Go usually focuses on the problem 
of creating a program to play the game, finding the best 
move from a given board position [ 1 1. We will make use of one 
method developed in the course of such research and apply it 
to the analysis of existing game records with the aim of helping 
humans to play and understand the game better instead. 

Go is a two-player full-information board game played on 
a square grid (usually 19 x 19 lines) with black and white 
stones; the goal of the game is to surround the most territory 
and capture enemy stones. We assume basic familiarity with 
the game. 

Many Go players are eager to play using computers (usually 
over the internet) and review games played by others on 
computers as well. This means that large amounts of game 
records are collected and digitally stored, enabling easy pro- 
cessing of such collections. However, so far only little has 
been done with the available data. We are aware only of uses 
for simple win/loss statistics |2| |3] and "next move" statistics 
on a specific position O 0. 

Additionally, a simple machine learning technique based 
on GNU Go's [6] move evaluation feature has recently been 
presented in 0. The authors used decision trees to predict 
whether a given user belongs into one of three classes based 
on his strength (causal, intermediate or advanced player). This 
method is however limited by the blackbox-use of GNU Go 
engine, making it unsuitable for more detailed analysis of the 
moves. 

P. Baudis is student at the Faculty of Math and Physics, Charles University, 
Prague, CZ, and also does some of his Computer Go research as an employee 
of SUSE Labs Prague, Novell CZ. 

J. Moudfik is student at the Faculty of Math and Physics, Charles University, 
Prague, CZ. 



We present a more in-depth approach — from all played 
moves, we devise a compact evaluation of each player. We then 
explore correlations between evaluations of various players in 
the light of externally given information. This way, we can 
discover similarity between move characteristics of players 
with the same playing strength, or discuss the meaning of the 
"playing style" concept on the assumption that similar playing 
styles should yield similar move characteristics. 

We show that a sample of player's games can be used to 
quite reliably estimate player's strength, game style, or even 
a time when he/she was active. Apart from these practical 
results, the research may prove to be useful for Go theoretists 
by investigating the principles behind the classical "style" 
classification. 

We shall first present details of the extraction and sum- 
marization of information from the game corpus (section HIT) . 
Afterwards, we will explain the statistical methods applied 
(section ITTTb . and then describe our findings on particular 
game collections, regarding the analysis of either strength 
(section IIVI ) or playing styles (section [V}. Finally, we will 
explore possible interpretations and few applications of our 
research (section IVII ) and point out some possible future 
research directions (section fVIIl l. 

II. Data Extraction 

As the input of our analysis, we use large collections of 
game records in SGF format [8] grouped by the primary object 
of analysis (player name when analyzing style of a particular 
player, player rank when looking at the effect of rank on 
data, etc.). We process the games, generating a description for 
each played move - a pattern, being a combination of several 
pattern features described below. 

We compute the occurence counts of all encountered pat- 
terns, eventually composing rt-dimensional pattern vector p 
of counts of the n (we use n — 500) globally most frequent 
patterns (the mapping from patterns to vector elements is 
common for all generated vectors). We can then process and 
compare just the pattern vectors. 

A. Pattern Features 

When deciding how to compose the patterns we use to 
describe moves, we need to consider a specificity tradeoff 
— overly general descriptions carry too few information 
to discern various player attributes; too specific descriptions 
gather too few specimen over the games sample and the vector 
differences are not statistically significant. 
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We have chosen an intuitive and simple approach inspired 
by pattern features used when computing Elo ratings for 
candidate patterns in Computer Go play (9J- Each pattern is 
a combination of several pattern features (name-value pairs) 
matched at the position of the played move. We use these 
features: 

• capture move flag, 
« atari move flag, 

• atari escape flag, 

• contiguity-to-last fia^] — whether the move has been 
played in one of 8 neighbors of the last move, 

• contiguity-to-second-last flag, 

• board edge distance — only up to distance 4, 

> and spatial pattern — configuration of stones around the 
played move. 

The spatial patterns are normalized (using a dictionary) to be 
always black-to-play and maintain translational and rotational 
symmetry. Configurations of radius between 2 and 9 in the 
gridcular metrico are matched. 

Pattern vectors representing these features contain infor- 
mation on played shape as well as a basic representation of 
tactical dynamics — threats to capture stones, replying to last 
move, or ignoring opponent's move elsewhere to return to an 
urgent local situation. The shapes often correspond to opening 
moves (either in empty corners and sides, or as part of joseki 
— commonly played sequences) characteristic for a certain 
strategic aim. In the opening, even a single-line difference in 
the distance from the border can have dramatic impact on 
further local and global development. 

B. Vector Reseating 

The pattern vector elements can have diverse values since 
for each object, we consider a different number of games (and 
thus patterns). Therefore, we normalize the values to range 
[—1, 1], the most frequent pattern having the value of 1 and 
the least occuring one being —1. Thus, we obtain vectors 
describing relative frequency of played patterns independent 
on number of gathered patterns. But there are multiple ways 
to approach the normalization. 

1 ) Linear Normalization: An intuitive solution is to linearly 
re-scale the values using: 

Ui = 

■f-max 

This is the default approach; we have used data processed by 
only this computation unless we note otherwise. As shown 
on fig. 03 most of the spectrum is covered by the few most- 
occuring patterns (describing mostly large-diameter shapes 
from the game opening). This means that most patterns will be 
always represented by only very small values near the lower 
bound. 

1 We do not consider contiguity features in some cases when we are working 
on small game samples and need to reduce pattern diversity. 

2 The gridcular metric d(x,y) = \Sx\ + \Sy\ + max(|<5a;|, \&y\) produces 
a circle-like structure on the Go board square grid 1101. 
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Fig. 1. Log-scaled number of pattern occurences in the GoGoD games 
examined in sec. [V] 

2) Extended Normalization: To alleviate this problem, we 
have also tried to modify the linear normalization by applying 
two steps — pre-processing the raw counts using 

A = logfo + 1) 

and post-processing the re-scaled values by the logistic func- 
tion: 



However, we have found that this method is not universally 
beneficial. In our styles case study (sec.|V), this normalization 
produced PCA decomposition with significant dimensions 
corresponding better to some of the prior knowledge and more 
instructive for manual inspection, but ultimately worsened 
accuracy of our classifiers. From this we conjecture that the 
most frequently occuring patterns are also most important for 
classification of major style aspects. 

C. Implementation 

We have implemented the data extraction by making use of 
the pattern features matching implementation within the Pachi 
Go-playing program ifTTI . which works according to the Elo- 
rating pattern selection scheme f£|. We extract information on 
players by converting the SGF game records to GTP stream 
|12| that feeds Pachi's patternscan engine, producing 
a single patternspec (string representation of the particular 
pattern features combination) per move. Of course, only moves 
played by the appropriate player are collected. 

III. Data Mining 

To assess the properties of gathered pattern vectors and 
their influence on playing styles, we analyze the data using 
several basic data minining techniques. The first two methods 
(analytic) rely purely on single data set and serve to show 
internal structure and correlations within the data set. 

Principal Component Analysis (PCA) [13] finds orthogo- 
nal vector components that represent the largest variance of 
values within the dataset. That is, PCA will produce vectors 
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representing the overall variability within the dataset — the 
first vector representing the "primary axis" of the dataset, the 
next vectors representing the less significant axes; each vector 
has an associated number that determines its impact on the 
overall dataset variance: 1.0 would mean that all points within 
the dataset lie on this vector, value close to zero would mean 
that removing this dimension would have little effect on the 
overall shape of the dataset. 

Reversing the process of the PCA by backprojecting the 
orthogonal vector components into the original pattern space 
can indicate which patterns correlate with each component. 
Additionally, PCA can be used as vector preprocessing for 
methods that are negatively sensitive to pattern vector compo- 
nent correlations. 

On the other hand, Sociomaps [ 14| [ 15 1 [ 16] produce spatial 
representation of the data set elements (e.g. players) based 
on similarity of their data set features. Projecting some other 
information on this map helps illustrate connections within the 
data set. 

Furthermore, we test several classification methods that 
assign an output vector O to each pattern vector P, the output 
vector representing the information we want to infer from the 
game sample — e.g. assessment of the playing style, player's 
strength or even meta-information like the player's era or the 
country of origin. Initially, the methods must be calibrated 
(trained) on some prior knowledge, usually in the form of 
reference pairs of pattern vectors and the associated output 
vectors. The reference set is divided into training and testing 
pairs and the methods can be compared by the mean square 
error (MSE) on testing data set (difference of output vectors 
approximated by the method and their real desired value). 

The most trivial method is approximation by the PCA repre- 
sentation matrix, provided that the PCA dimensions have some 
already well-defined interpretation. This can be true for single- 
dimensional information like the playing strength. Aside of 
that, we test the fc-Nearest Neighbors (k-NN) classifier [17| 
that approximates O by composing the output vectors of k 
reference pattern vectors closest to P. 

Another classifier is a multi-layer feed-forward Artificial 
Neural Network (see e.g. |fl8l ). The neural network can learn 
correlations between input and output vectors and generalize 
the "knowledge" to unknown vectors. The neural network can 
be more flexible in the interpretation of different pattern vector 
elements and discern more complex relations than the fc-NN 
classifier, but may not be as stable and expects larger training 
sample. 

Finally, a commonly used classifier in statistical inference is 
the Naive Bayes Classifier |fl9l . It can infer relative probability 
of membership in various classes based on previous evidence 
(training patterns). 

A. Statistical Methods 

We use couple of general statistical analysis methods to- 
gether with the particular techniques. To find correlations 
within or between extracted data and some prior know- 
ledge (player rank, style vector), we compute the well-known 
Pearson product-moment correlation coefficient (PMCC) |20|, 



measuring the strength of the linear dependency between any 
two dimensions: 

cov(X, Y) 

rx,Y = 

o~x°~y 

To compare classifier performance on the reference data, 
we employ k-fold cross validation: we randomly divide the 
training set into k distinct segments of similar sizes and then 
iteratively use each part as a testing set as the other k — 1 
parts are used as a training set. We then average results over 
all iterations. 

B. Principal Component Analysis 

We use Principal Component Analysis to reduce the di- 
mensions of the pattern vectors while preserving as much 
information as possible, assuming inter-dependencies between 
pattern vector dimensions are linear. Technically, PCA is 
an eigenvalue decomposition of a covariance matrix of cen- 
tered pattern vectors, producing a linear mapping o from n- 
dimensional vector space to a reduced m-dimensional vector 
space. The m eigenvectors of the original vectors' covariance 
matrix with the largest eigenvalues are used as the base of the 
reduced vector space; the eigenvectors form projection matrix 
W. 

For each original pattern vector we obtain its new 
representation fi in the PCA base as shown in the following 
equation: 

Ti = W-pi (1) 
The whole process is described in the Algorithm Q] 

Algorithm 1 PCA - Principal Component Analysis 
Require: m > 0, set of players R with pattern vectors p r 

for r e R do 

Pr <-fir — P 

end for 

for e {l,...,n} x {l,...,n} do 

Cov[i,j] <- 1/\R\ ■ Y^reRPri ' Prj 
end for 

Compute Eigenvalue Decomposition of Cov matrix 
Get to largest eigenvalues 

Most significant eigenvectors ordered by decreasing eigen- 
values form the rows of matrix W 
for r e R do 

f r <— Wp r 
end for 



C. Sociomaps 

Sociomaps are a general mechanism for visualizing rela- 
tionships on a 2D plane such that given ordering of the player 
distances in the dataset is preserved in distances on the plane. 
In our particular case, we will consider a dataset S of small- 
dimensional vectors Sj. First, we estimate the significance of 

3 A desirable property of PMCC is that it is invariant to translations and 
reseating of the vectors. 
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difference of each two subjects. Then, we determine projection 
ip of all the Si to spatial coordinates of an Euclidean plane, 
such that it reflects the estimated difference significances. 

To quantify the differences between the subjects (team 
profiling) into an A matrix, for each two subjects i,j we 
compute the scalar distance! of Sj , Sj and then estimate the 
Aij probability of at least such distance occuring in uniformly- 
distributed input (the higher the probability, the more signifi- 
cant and therefore important to preserve the difference is). 

To visualize the quantified differences, we need to find 
the (p projection such that it maximizes a three-way ordering 
criterion: ordering of any three members within A and on the 
plane (by Euclidean metric) must be the same. 



max V $(ip,i,j,k) 

ID ' * 



®(<p,i,j, k) 



1 5{l,Ai h A ik ) 
otherwise 



1 \a - b\ > \a - c| 
6(a, b, c) = ^ \a — b\ — \a — c\ 
-1 \a — b\ < \a - c\ 



The <p projection is then determined by randomly initializing 
the position of each subject and then employing gradient 
descent methods. 



D. k-Nearest Neighbors Classifier 

Our goal is to approximate the player's output vector 
O, knowing their pattern vector P. We further assume that 
similarities in players' pattern vectors uniformly correlate with 
similarities in players' output vectors. 

We require a set of reference players R with known pattern 
vectors p r and output vectors o r . O is approximated as 
weighted average of output vectors Oi of k players with pattern 
vectors pi closest to P. This is illustrated in the Algorithm 
12 Note that the weight is a function of distance and is 
not explicitly defined in Algorithm [2] During our research, 
exponentially decreasing weight has proven to be sufficient, 
as detailed in each of the case studies. 

Algorithm 2 k-Nearest Neighbors 

Require: pattern vector P, k > 0, set of reference players R 

for all r e R do 

D[r] EuclideanDistance(p r , P) 
end for 

N <— SelectSmallest(k, R, D) 

6 4r- 

for all r e N do 

6 <- 6+ Weight (D[r]) ■ o r 
end for 




E. Neural Network Classifier 

Feed-forward neural networks are known for their ability 
to generalize and find correlations between input patterns and 
output classifications. Before use, the network is iteratively 
trained on the training data until the error on the training set 
is reasonably small. 

1 ) Computation and activation of the NN: Technically, the 
neural network is a network of interconnected computational 
units called neurons. A feed-forward neural network has a 
layered topology. It usually has one input layer, one output 
layer and an arbitrary number of hidden layers between. Each 
neuron i gets input from all neurons in the previous layer, each 
connection having specific weight tu,j. 

The computation proceeds in discrete time steps. In the first 
step, the neurons in the input layer are activated according to 
the input vector. Then, we iteratively compute output of each 
neuron in the next layer until the output layer is reached. The 
activity of output layer is then presented as the result. 

The activation yi of neuron i from the layer / is computed 

as 



(2) 



where J is the previous layer, while yj is the activation for 
neurons from J layer. Function /() is a so-called activation 
function and its purpose is to bound the outputs of neurons. 
A typical example of an activation function is the sigmoid 
function@ 

2) Training: Training of the feed-forward neural network 
usually involves some modification of supervised Backprop- 
agation learning algorithm. We use first-order optimization 
algorithm called RPROP ETI . As outlined above, the training 
set T consists of (pi, Oj) pairs. The training algorithm is shown 
in Algorithm [3] 

Algorithm 3 Training Neural Network 

Require: Train set T, desired error e, max iterations M 

N Randomly InitializedNetworki) 

It<-0 

repeat 

It <- It + 1 

Aw <- 

TotalError <— 

for all (Input, DesiredOutput) G T do 

Output ^— Result (N, Input) 

Error 4— \DesiredOutput — Output\ 

Aw <— Aw + WeightUpdate(N, Error) 

TotalError TotalError + Error 
end for 

N <- Modify Weights (N, Aw) 
until TotalError < e or It > M 



4 We use the Manhattan metric d(x, y) = |a;, — ft]. 



5 A special case of the logistic function o(x) = (1 + e ( rx + k )^ 1, 
Parameters control the growth rate r and the x-position k. 
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F. Naive Bayes Classifier 

The Naive Bayes Classifier uses existing information to 
construct probability model of likelihoods of given feature 
variables based on a discrete-valued class variable. Using 
the Bayes equation, we can then estimate the probability 
distribution of class variable for particular values of the feature 
variables. 

In order to approximate the player's output vector O based 
on pattern vector P, we will compute each element of 
the output vector separately, covering the output domain by 
several fc-sized discrete intervals (classes). In fact, we use 
the PCA-represented input R (using the 10 most significant 
dimensions), since it better fits the pre-requisites of the Bayes 
classifier - values in each dimension are more independent and 
they approximate the normal distribution better. Additionally, 
small input dimensions are computationaly feasible. 

When training the classifier for O element Oj of class c = 
\pi/k\, we assume the R elements are normally distributed 
and feed the classifier information in the form 

R | c 

estimating the mean /i c and standard deviation <j c of each R 
element for each encountered c (see algorithm HJ. Then, we 
can query the built probability model on 

max P(c | R) 

c 

obtaining the most probable class i for an arbitrary R Each 
probability is obtained using the normal distribution formula: 



P(c | x) = 



1 



: exp 



-(x - ii c ) 2 
2a? 



Algorithm 4 Training Naive Bayes 



Require: Training set T = (R, c) 
for all (R, c) e T do 

RbyC c <- RbyC c U {R} 
end for 
for all c do 



\RbyC r \ ^ReRbyC, 



end for 

for all c do 

. i_ 

^~ \RbyC c 
end for 



RGRbyC, 



R 



R - He 



G. Implementation 

We have implemented the data mining methods as the 
"gostyle" open-source framework l22l . made available under 
the GNU GPL licence. The majority of our basic processing 
and analysis is implemented in the Python |23| programming 
language. 

We use several external libraries, most notably the MDP 
library l24l for the PCA analysis. The neural network com- 
ponent is written using the libfann C library ll25ll . The Naive 
Bayes Classifier is built around the AI : :NaiveBayesl Perl 
module 11261 . The sociomap has been visualised using the Team 
Profile Analyzer l27l which is a part of the Sociomap suite 



IV. Strength Analysis 

First, we have used our framework to analyse correlations 
of pattern vectors and playing strength. Like in other com- 
petitively played board games, Go players receive real-world 
rating number based on tournament games, and rank based on 
their rating. The amateur ranks range from 30-kyu (beginner) 
to 1-kyu (intermediate) and then follows 1-dan to 9-dan (top- 
level player). 

There are multiple independent real-world ranking scales 
(geographically based), while online servers also maintain 
their own user rank list. The difference between scales can 
be up to several ranks and the rank distributions also differ 

ma. 



A. Data source 

As the source game collection, we use the Go Teaching 
Ladder reviews archive ll30l . This collection contains 7700 
games of players with strength ranging from 30-kyu to 4-dan; 
we consider only even games with clear rank information. 

Since the rank information is provided by the users and may 
not be consistent, we are forced to take a simplified look at the 
ranks, discarding the differences between various systems and 
thus somewhat increasing error in our model@ We represent 
the rank in our dataset as an integer in the range [—3, 30] 
with positive numbers representing the kyu ranks and numbers 
smaller than 1 representing the dan ranks: 4-dan maps to —3, 
1-dan to 0, etc. 



B. Strength PCA analysis 

First, we have created a single pattern vector for each 
rank between 30-kyu to 4-dan; we have performed PCA 
analysis on the pattern vectors, achieving near-perfect rank 
correspondence in the first PCA dimension^ (figure |2). We 
measure the accuracy of the strength approximation by the 
first PCA dimension using Pearson's r (see IHI-A1 >. yielding 
very satisfying value of r = 0.979 implying extremely strong 
correlation. 

This reflects the trivial fact that the most important "defining 
characteristic" of a set of players grouped by strength is indeed 
their strength and confirms that our methodics is correct. At the 
same time, this result suggests that it is possible to accurately 
estimate player's strength just from a sample of his games, as 
we confirm below. 

When investigating a player's p, the PCA decomposition 
could be also useful for study suggestions — a program could 
examine the pattern gradient at the player's position on the 
PCA dimensions and suggest patterns to avoid and patterns to 
play more often. Of course, such an advice alone is certainly 
not enough and it must be used only as a basis of a more 
thorough analysis of reasons behind the fact that the player 
plays other patterns than they "should". 

6 Since our results seem satisfying, we did not pursue to try another 
collection; one could e.g. look at game archives of some Go server to work 
within single more-or-less consistent rank model. 

7 The eigenvalue of the second dimension was four times smaller, with no 
discernable structure revealed within the lower-order eigenvectors. 
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Fig. 2. PCA of by-strength vectors 
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C. Strength Classification 

In line with results of the PCA analysis, we have tested the 
strength approximation ability of fc-NN (sec. IHI-Dl l. neural 
network (sec. IIII-EI ). and a simple PCA-based classifier (sec. 

[nop). 

1) Reference (Training) Data: We have trained the tested 
classifiers using one pattern vector per rank (aggregate over all 
games played by some player declaring the given rank), then 
performing PCA analysis to reduce the dimension of pattern 
vectors. We have explored the influence of different game 
sample sizes (G) on the classification accuracy to determine 
the practicality and scaling abilities of the classifiers. In 
order to reduce the diversity of patterns (negatively impacting 
accuracy on small samples), we do not consider the contiguity 
pattern features. 

The classifiers were compared by running a many-fold 
validation by repeatedly and exhaustively taking disjunct G— 
game samples of the same rank from the collection and 
measuring the standard error of the classifier. Arbitrary game 
numbers were approximated by pattern file sizes, iteratively 
selecting all games of randomly selected player of the required 
strength. 

2) Results: The results are shown in the table [I] The G 
column describes the number of games in each sample, MSE 
column shows measured mean square error and a is the 
empirical standard deviation. Methods are compared (column 
Cmp) to the random classifier by the quotient of their a. 

From the table, it should be obvious that the fc-NN is 
obtaining good accuracy even on as few as 9 games as 
a sample, where the classifier performs within a standard 
deviation of 4.6kyu. For a large number of training vectors - 
albeit not very accurate due to small sample sizes - the neural 
network classifier performs very similarly. For samples of 2 
games, the neural network is even slightly better on average. 
However, due to the decreasing number of training vectors 
with increasing game sample sizes, the neural network gets 
unusable for large game sample sizes. The table therefore only 
shows the neural network results for samples of 17 games 
and smaller. PCA-based classifier (the most significant PCA 
eigenvector position is simply directly taken as a rank) and a 
random classifier are listed mainly for the sake of comparison, 
because they do not perform competetively. 



TABLE I 

Strength Classifier Performance 



Method 


G 


MSE 


a 


Cmp 


fc-NN 


85 


5.514 


2.348 


6.150 




43 


8.449 


2.907 


4.968 




17 


10.096 


3.177 


4.545 




9 


21.343 


4.620 


3.126 




2 


52.212 


7.226 


1.998 


Neural Network 


17 


110.633 


10.518 


1.373 




9 


44.512 


6.672 


2.164 




2 


43.682 


6.609 


2.185 


PCA 


85 


24.070 


4.906 


2.944 




43 


31.324 


5.597 


2.580 




17 


50.390 


7.099 


2.034 




9 


72.528 


8.516 


1.696 




2 


128.660 


11.343 


1.273 


Rnd 


N/A 


208.549 


14.441 


1.000 



3) k-NN parameters: Using the 4-Nearest Neighbors clas- 
sifier with the weight function 



Weight(x) = o,QM*Distance(S) 



(3) 



(parameter M ranging from 30 to 6). 

4) Neural network's parameters: The neural network clas- 
sifier had three-layered architecture (one hidden layer) com- 
prising of these numbers of neurons: 



Layer 


Input 


Hidden 


Output 


119 


35 


1 



The network was trained until the square error on the 
training set was smaller than 0.0005. Due to the small number 
of input vectors, this only took about 20 iterations of RPROP 
learning algorithm on average. 

V. Style Analysis 

As a second case study for our pattern analysis, we in- 
vestigate pattern vectors p of various well-known players, 
their relationships in-between and to prior knowledge in order 
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TABLE II 

Covariance Measure of Prior Information Dimensions 





T 


D 


a 


e 


year 


r 


1.000 


-0.438 


-0.581 


0.721 


0.108 


U) 




1.000 


0.682 


0.014 


-0.021 


a 






1.000 


-0.081 


0.030 


e 








1.000 


-0.073 


y- 










1.000 



to explore the correlation of prior knowledge with extracted 
patterns. We look for relationships between pattern vectors and 
perceived "playing style" and attempt to use our classifiers to 
transform the pattern vector p to a style vector s. 

A. Data sources 

1 ) Game database: The source game collection is GoGoD 
Winter 2008 PD containing 55000 professional games, dating 
from the early Go history 1500 years ago to the present. We 
consider only games of a small subset of players (table fTlTb . 
These players were chosen for being well-known within the 
players community, having large number of played games in 
our collection and not playing too long ago. 

2) Expert-based knowledge: In order to provide a refer- 
ence frame for our style analysis, we have gathered some 
information from game experts about various traditionally 
perceived style aspects to use as a prior knowledge. This 
expert-based knowledge allows us to predict styles of unknown 
players based on the similarity of their pattern vectors, as well 
as discover correlations between styles and particular move 
patterns. 

Experts were asked to mark four style aspects of each of 
the given players on the scale from 1 to 10. The style aspects 
are defined as shown: 



Style 


1 


10 


Territoriality r 


Moyo 


Territory 


Orthodoxity w 


Classic 


Novel 


Aggressivity a 


Calm 


Fighting 


Thickness 9 


Safe 


Shinogi 



We have devised these four style aspects based on our own 
Go experience and consultations with other experts. The used 
terminology has quite clear meaning to any experienced Go 
player and there is not too much room for confusion, except 
possibly in the case of "thickness" — but the concept is not 
easy to pin-point succintly and we also did not add extra 
comments on the style aspects to the questionnaire deliberately 
to accurately reflect any diversity in understanding of the 
terms. Averaging this expert based evaluation yields reference 
style vector s r (of dimension 4) for each player r from the 
set of reference players R. 

Throughout our research, we have experimentally found 
that playing era is also a major factor differentiating between 
patterns. Thus, we have further extended the s r by median 
year over all games played by the player. 



Three high-level Go players (Alexander Dinerstein 3 -pro, 
Motoki Noguchi 7-dan and Vft Brunner 4-dan) have judged 
the style of the reference players. The complete list of answers 
is in table [HI] Standard error of the answers is 0.952, making 
the data reasonably reliable, though much larger sample would 
of course be more desirable (but beyond our means to collect). 
We have also found a significant correlation between the 
various style aspects, as shown by the Pearson's r values in 
table mi 

We have made few manual adjustments in the dataset, 
disregarding some players or portions of their games. This 
was done to achieve better consistency of the games (e.g. 
considering only games of roughly the same age) and to 
consider only sets of games that can be reasonably rated as a 
whole by human experts (who can give a clear feedback in this 
effect). This filtering methodology can be easily reproduced 
and such arbitrary decisions are neccessary only for processing 
the training dataset, not for using it (either for exloration or 
classification). 

B. Style PCA analysis 

We have looked at the ten most significant dimensions of 
the pattern data yielded by the PCA analysis of the reference 
player set (fig. [3] shows the first three). We have again com- 
puted the Pearson's r for all combinations of PCA dimensions 
and dimensions of the prior knowledge style vectors to find 
correlations. 

It is immediately obvious both from the measured r and 
visual observation that by far the most significant vector 
corresponds very well to the territoriality of the players, 
confirming the intuitive notion that this aspect of style is the 
one easiest to pin-point and also most obvious in the played 
shapes and sequences (that can obviously aim directly at taking 
secure territory or building center-oriented framework). Thick 
(solid) play also plays a role, but these two style dimensions 
are already correlated in the prior data. 

The other PCA dimensions are somewhat harder to interpret, 
but there certainly is significant influence of the styles on the 
patterns; the correlations are all presented in table [IV] (Larger 
absolute value means better linear correspondence.) 

We also list the characteristic spatial patterns of the PCA 
dimension extremes (tables [VJ IVB . determined by their coef- 
ficients in the PCA projection matrix — however, such naive 
approach has limited reliability, better methods will have to be 
researched!! We do not show the other pattern features since 
they carry no useful information in the opening stage. 

The PCA results presented above do not show much corre- 
lation between the significant PCA dimensions and the lj and 
a style dimensions. However, when we applied the extended 
vector normalization (sec. lII-B2l see table I Vill i, some less sig- 
nificant PCA dimensions exhibited clear correlations^ While 
we do not use the extended normalization results elsewhere 

8 For example, as one of highly ranked "Takemiya's" PCA1 patterns, 3,3 
corner opening was generated, completely inappropriately; it reflects some 
weak ordering in bottom half of the dimension, not global ordering within 
the dimension. 

9 We have found that c = 6 in the post-processing logistic function produces 
the most instructive PCA output on our particular game collection. 
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TABLE III 

Expert-Based Style Aspects of Selected Professionals 1 2 



Player 


T 




a. 


e 


Go Seigen 3 


6.0 ±2.0 


9.0 ±1.0 


8.0 ±1.0 


5.0 ± 1.0 


Ishida Yoshio 4 


8.0 ±1.4 


5.0 ±1.4 


3.3 ± 1.2 


5.3 ± 0.5 


Miyazawa Goro 


1.5 ±0.5 


10 ±0 


9.5 ±0.5 


4.0 ± 1.0 


Yi Ch'ang-ho 5 


7.0 ±0.8 


5.0 ±1.4 


2.6 ±0.9 


2.6 ± 1.2 


Sakata Eio 


7.6 ±1.7 


4.6 ±0.5 


7.3 ±0.9 


8.0 ± 1.6 


Fujisawa Hideyuki 


3.5 ±0.5 


9.0 ± 1.0 


7.0 ±0.0 


4.0 ±0.0 


Otake Hideo 


4.3 ±0.5 


3.0 ±0.0 


4.6 ± 1.2 


3.6 ± 0.9 


Kato Masao 


2.5 ±0.5 


4.5 ± 1.5 


9.5 ±0.5 


4.0 ±0.0 


Takemiya Masaki 4 


1.3 ±0.5 


6.3 ±2.1 


7.0 ±0.8 


1.3 ± 0.5 


Kobayashi Koichi 


9.0 ±1.0 


2.5 ± 0.5 


2.5 ±0.5 


5.5 ± 0.5 


Cho Chikun 


9.0 ±0.8 


7.6 ±0.9 


6.6 ± 1.2 


9.0 ±0.8 


Ma Xiaochun 


8.0 ±2.2 


6.3 ±0.5 


5.6 ± 1.9 


8.0 ±0.8 


Yoda Norimoto 


6.3 ±1.7 


4.3 ±2.1 


4.3 ±2.1 


3.3 ± 1.2 


Luo Xihe 


7.3 ±0.9 


7.3 ±2.5 


7.6 ±0.9 


6.0 ± 1.4 


O Meien 


2.6 ±1.2 


9.6 ±0.5 


8.3 ±1.7 


3.6 ± 1.2 


Rui Naiwei 


4.6 ±1.2 


5.6 ±0.5 


9.0 ±0.8 


3.3 ± 1.2 


Yuki Satoshi 


3.0 ±1.0 


8.5 ± 0.5 


9.0 ± 1.0 


4.5 ± 0.5 


Hane Naoki 


7.5 ±0.5 


2.5 ± 0.5 


4.0 ±0.0 


4.5 ± 1.5 


Takao Shinji 


5.0 ± 1.0 


3.5 ± 0.5 


5.5 ± 1.5 


4.5 ± 0.5 


Yi Se-tol 


5.3 ±0.5 


6.6 ±2.5 


9.3 ±0.5 


6.6 ± 1.2 


Yamashita Keigo 4 


2.0 ±0.0 


9.0 ±1.0 


9.5 ±0.5 


3.0 ± 1.0 


Cho U 


7.3 ±2.4 


6.0 ±0.8 


5.3 ± 1.7 


6.3 ± 1.7 


Gu Li 


5.6 ±0.9 


7.0 ±0.8 


9.0 ±0.8 


4.0 ±0.8 


Chen Yaoye 


6.0 ±1.0 


4.0 ± 1.0 


6.0 ± 1.0 


5.5 ± 0.5 



1 Including standard deviation. Only players where we received at least two 
out of three answers are included. 

2 Since the playing era column does not fit into the table, we at least sort the 
players ascending by their median year. 

3 We do not consider games of Go Seigen due to him playing across 
several distinct eras and also being famous for radical opening experiments 
throughout the time, and thus featuring especially high diversity in patterns. 

4 We do not consider games of Ishida Yoshio and Yamashita Keigo for 
the PCA analysis since they are significant outliers, making high-order 
dimensions much like purely "similarity to this player". Takemiya Masaki 
has the similar effect for the first dimension, but that case corresponds to 
common knowledge of him being an extreme proponent of anti-territorial 
("moyo") style. 

5 We consider games only up to year 2004, since Yi Ch'ang-ho was prominent 
representative of a balanced, careful player until then and still has this 
reputation in minds of many players, but is regarded to have altered his 
style significantly afterwards. 



TABLE IV 

Covariance Measure of PCA and Prior Information 



Eigenval. 


T 




a 


8 


Year 


0.447 


-0.530 


0.323 


0.298 


-0.554 


0.090 


0.194 


-0.547 


0.215 


0.249 


-0.293 


0.630 


0.046 


0.131 


-0.002 


-0.128 


0.242 


0.630 


0.028 


-0.011 


0.225 


0.186 


0.131 


0.067 


0.024 


-0.181 


0.174 


-0.032 


-0.216 


0.352 



since they produced noticeably less accurate classifiers in all 
dimensions (including u) and a), it is instructive to look at the 
PCA dimensions. 

In contrast with the emphasis of opening patterns in the r 
and 9 dimensions, the most contributing patterns of the w and 



0.5 



-0.5 



Takemiya Masaki Yuki Satoshi 





' Fujisawa Hideyuki 




Takemiya Masaki 




Sakata Eio 




Otake Hideo 




Kato Masao 


Miyazawa Goro 




Ri li Mpiwpi 


Cho U 


Yi Ch'ang-ho 




O Meien 






Miyazawa Goro 


Luo Xihe 


O Meien 


Yi Se-tol 


Rui Naiwei 


Takao Shinji 




Kobayashi Koichi 


Hane Naoki 


Yoda Norimoto 




Kato Masao 
Yuki Satoshi 
Fujisawa Hideyuki 
Cho Chikun 
Hane Naoki 
Gu Li 

Ma Xiaochun 
Chen Yaoye 


Kobayashi Koichi 
Yi Ch'ang-ho 
Gu Li 

Cho Chikun 
Takao Shinji 
Luo Xihe 
Yoda Norimoto 
Yi Se-tol 


Cho U 




Otake Hideo 


Chen Yaoye 


Sakata Eio 


i Ma Xiaochun 



Sakata Eio 



Cho Chikun 
Yoda Norimoto 

Ma Xiaochun 

Kobayashi Koichi 

Takemiya Masaki 
Miyazawa Goro 

Kato Masao 
O Meien 

Otake Hideo 

Takao Shinji 
Fujisawa Hideyuki 

Luo Xihe 
Gu Li 

Yi Ch'ang-ho 

Yi Se-tol 
Chen Yaoye 

Hane Naoki 
Rui Naiwei 
Yuki Satoshi 



0.45 



0.19 

PCA dim. weights 



Cho U 
0.046 



Fig. 3. Columns with the most significant PCA dimensions of the dataset. 
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TABLE V 

Characteristic Patterns of PCAi 2 Dimensions 1 



PCAi — Moyo-oriented, thin-playing player 

































-< 
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0.274 

high corner/side 
opening 2 



0.086 

high comer 
approach 



0.083 
high distant pincer 



PCAi — Territorial, thick-playing player 



















-< 


> 





















-0.399 

low corner 
approach 



-0.399 -0.177 
low corner reply low corner enclosure 



PCA2 — Territorial, current player 



a 



































t 












-< 


> 






























< 

























-0.193 
low corner reply 4 



-0.139 

high distant 
approach/pincer 



-0.135 
near low pincer 



1 We present the patterns in a simplified compact form; in reality, they 
are usually somewhat larger and always circle-shaped (centered on the 
triangled move). We omit only pattern segments that are entirely empty. 

2 We give some textual interpretation of the patterns, especially since some 
of them may not be obvious unless seen in game context; we choose the 
descriptions based on the most frequently observer contexts, but of course 
the pattern can be also matched in other positions and situations. 

3 In the second PCA dimension, we find no correlated patterns; only 
uncorrelated and anti-correlated ones. 

4 As the second most significant pattern, we skip a slide follow-up pattern 
to this move. 



a dimensions are the middle-game patterns that occur less 
frequently and require the extended normalization not to be 
over-shadowed by the opening patterns^ E.g. the most char- 
acteristic patterns on the aggressiveness dimension represent 
moves that make life with small, unstable groups (connecting 
kosumi on second line or mouth-shape eyespace move), while 
the novel-ranked players seem to like the (in)famous tsuke- 

10 In the middle game, basic areas of influence have been staked out and 
invasions and group attacks are being played out. Notably, the board is much 
more filled than in the opening and thus particular specific-shape patterns 
repeat less often. 



TABLE VI 

Characteristic Patterns of PCA3 Dimension 1 



PCA3 — Old-time player 





































< 






0. 


511 









r 













low corner approach 



0.264 

low side or 
mokuhazushi 
opening 



0.258 
san-san opening 



PCA3 — Current player 



it 



-m- 









> — 




















-0.276 



-0.273 



-0.116 



low corner enclosure 3-4 corner opening high approach reply 



1 We cannot use terms "classic" and "modern" in case of PCA3 since the 
current patterns are commonplace in games of past centuries (not included 
in our training set) and many would call a lot of the old-time patterns 
modern inventions. Perhaps we can infer that the latest 21th-century play 
trends abandon many of the 20th-century experiments (lower echelon of 
our by-year samples) to return to the more ordinary but effective classic 
patterns. 

2 At this point, we skip two patterns already shown elsewhere: high 
side/corner opening and low corner reply. 

TABLE VII 

Covariance Measure of Externed-Normalization PCA 
and Prior Information 



Eigenval. 


T 




a 


e 


Year 


6.377 


0.436 


-0.220 


-0.289 


0.404 


-0.576 


1.727 


0.690 


0.340 


0.315 


-0.445 


0.639 


1.175 


-0.185 


0.156 


0.107 


-0.315 


0.320 


0.845 


0.064 


-0.102 


-0.189 


0.032 


0.182 


0.804 


-0.185 


0.261 


0.620 


0.120 


0.056 


0.668 


-0.027 


0.055 


0.147 


-0.198 


0.155 


0.579 


0.079 


0.509 


0.167 


0.294 


-0.019 



nobi joseki sequence^ This may either mean that novel 
players like to play the joseki more, or (more likely, in 
our opinion) that novel players are more likely to get into 
unorthodox situation that require resorting to the tsuke-nobi 
sequence. We believe that the next step in interpreting our 
analytical results will be more refined prior information input 
and precise analysis of the outputs by Go experts. 

Fig. [4] shows the Sociomap visualisation as an alternate 
view of the player relationships and similarity, as well as 
correlation between the expert-given style marks and the PCA 
decomposition. The four-dimensional style vectors are used 

1 1 Tsuke-nobi is a well-known joseki popular among beginners, but profes- 
sionals usually play it only in special contexts. 
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Fig. 4. Sociomap visualisation. The spatial positioning of players is based 
on the expert knowledge, while the node heights (depicted by contour lines) 
represent the pattern vectors. 



as input for the Sociomap Tenderer and determine the spatial 
positions of players. The height of a node is then determined 
using first two PCA dimensions R\ , R2 and their eigenvalues 
Ai, A2 as their linear combination: 

h = X\Ri + A2-R2 

We can observe that the terrain of the sociomap is rea- 
sonably "smooth", again demonstrating some level of connec- 
tion between the style vectors and data-mined information. 
High countour density indicates some discrepancy; in case of 
Takemiya Masaki and Yi Ch'ang-ho, this seems to be merely 
an issue of scale, while the Rui Naiwei — Gu Li cliff suggests 
a genuine problem; we cannot say now whether it is because of 
imprecise prior information or lacking approximation abilities 
of our model. 

C. Style Classification 

Similarly to the the Strength classification (section IIV-CI ). 
we have tested the style inference ability of fc-NN (sec. HII-Dl ). 
neural network (sec. IIII-EI ). and Bayes (sec. IIII-FI ) classifers. 

1) Reference (Training) Data: As the reference data, we 
use expert-based knowledge presented in section IV-A2I For 
each reference player, that gives 4-dimensional style vector 
(each component in the range of [1, 10])F^I 

All input (pattern) vectors were preprocessed using PCA, 
reducing the input dimension from 400 to 23. We measure the 
performance on the same reference data using 5-fold cross 
validation. To put our measurements in scale, we also include 
a random classifier in our results. 

12 Since the neural network has activation function with range [—1, 1], we 
have linearly rescaled the style vectors from interval [1, 10] to [—1,1] before 
using the training data. The network's output was afterwards rescaled back to 
allow for MSE comparison. 
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TABLE VIII 
Comparison of style classifiers 





MSE 




Classifier 


T 




a 


e 


Mean 


Cmp 


Joint classifier 1 


4.04 


5.25 


3.52 


3.05 


3.960 


2.97 


Neural network 


4.03 


6.15 


3.58 


3.79 


4.388 


2.68 


fc-NN (fc = 2) 


4.08 


5.40 


4.77 


3.37 


4.405 


2.67 


fc-NN (fc = 3) 


4.05 


5.58 


5.06 


3.41 


4.524 


2.60 


fc-NN (fc = 1) 


4.52 


5.26 


5.36 


3.09 


4.553 


2.59 


fc-NN (fc = 4) 


4.10 


5.88 


5.16 


3.60 


4.684 


2.51 


Naive Bayes 


4.48 


6.90 


5.48 


3.70 


5.143 


2.29 


Random class. 


12.26 


12.33 


12.40 


10.11 


11.776 


1.00 



Note that these measurements have a certain variance. Since the Joint 
classifier performance was measured from scratch, the precise values 
may not match appropriate cells of the used methods. 



2) Results: The results are shown in the table I Villi Second 
to fifth columns in the table represent mean square error 
(MSE) of the examined styles, Mean is the mean square error 
across the styles and finally, the last column Cmp represents 
Mean(Randomclassifier) / Mean(X) - comparison of mean 
square error of each method with the random classifier. To 
minimize the effect of random variables, all numbers were 
taken as an average of 200 runs of the cross validation. 

Analysis of the performance of fc-NN classifier for different 
fc-values has shown that different fc-values are suitable to 
approximate different styles. Combining the fc-NN classifiers 
with the neural network (so that each style is approximated 
by the method with lowest MSE in that style) results in Joint 
classifier, which outperforms all other methods. The Joint 
classifier has outstanding MSE 3.960, which is equivalent to 
standard error of er = 1.99 per style F^l 

3) k-NN parameters: All three variants of fc-NN classifier 
(fc = 2,3,4) had the weight function 

Weight(x) = Q_s w * mstance{S) (4) 

The parameters were chosen empirically to minimize the MSE. 

4) Neural network's parameters: The neural network clas- 
sifier had three-layered architecture (one hidden layer) com- 
prising of these numbers of neurons: 



Layer 


Input 


Hidden 


Output 


23 


30 


4 



The network was trained until the square error on the 
training set was smaller than 0.0003. Due to a small number of 
input vectors, this only took 20 iterations of RPROP learning 
algorithm on average. 

5) Naive Bayes parameters: We have chosen fc = 10/7 
as our discretization parameter; ideally, we would use fc = 1 
to fully cover the style marks domain, however our training 
sample turns out to be too small for that. 

13 We should note that the pattern vector for each tested player was 
generated over at least few tens of games. 
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VI. Proposed Applications 

We believe that our findings might be useful for many 
applications in the area of Go support software as well as Go- 
playing computer engines. However, our foremost aim is to use 
the style analysis as an excellent teaching aid — classifying 
style dimensions based on player's pattern vector, many study 
recommendations can be given, e.g. about the professional 
games to replay, the goal being balancing understanding of 
various styles to achieve well-rounded skill set0 A user- 
friendly tool based on our work is currently in development. 

Another promising application we envision is helping to 
determine the initial real-world rating of a player before their 
first tournament based on a sample of their games played 
on the internet; some players especially in less populated 
areas could get fairly strong before playing in their first real 
tournament. Similarly, a computer Go program can quickly 
classify the level of its human opponent based on the pattern 
vector from their previous games and auto-adjust its difficulty 
settings accordingly to provide more even games for beginners. 
This can also be achieved using win-loss statistics, but pattern 
vector analysis should converge faster initially, providing a 
much better user experience. 

We hope that more strong players will look into the style 
dimensions found by our statistical analysis — analysis of 
most played patterns of prospective opponents might prepare 
for a tournament game, but we especially hope that new 
insights on strategic purposes of various shapes and general 
human understanding of the game might be improved by 
investigating the style-specific patterns. 

Of course, it is challenging to predict all possible uses of our 
work by others. Some less obvious applications might include 
analysis of pattern vectors extracted from games of Go-playing 
programs: the strength and style classification might help to 
highlight some weaknesses and room for improvements 
Also, new historical game records are still occassionally being 
discovered; pattern-based classification might help to suggest 
or verify origin of the games in Go Archeology. 

VII. Future Research 

Since we are not aware of any previous research on this 
topic and we are limited by space and time constraints, plenty 
of research remains to be done in all parts of our analysis 
— we have already noted many in the text above. Most 
significantly, different methods of generating and normalizing 
the p vectors can be explored and other data mining methods 
could be investigated. Better ways of visualising the rela- 
tionships would be desirable, together with thorough expert 
dissemination of internal structure of the player pattern vectors 
space: more professional players should be consulted on the 
findings and for style scales calibration. 

14 The strength analysis could be also used in a similar fashion, but the 
lesson learned cannot simply be "play pattern X more often"; instead, the 
insight lays in the underlying reason of disproportionate frequency of some 
patterns. 

15 Of course, correlation does not imply causation and we certainly do not 
suggest simply optimizing Go-playing programs according to these vectors. 
However, they could hint on general shortcomings of the playing engines if 
the actual cause of e.g. surprisingly low strength prediction is investigated. 



It can be argued that many players adjust their style by 
game conditions (Go development era, handicap, komi and 
color, time limits, opponent) or that styles might express 
differently in various game stages. These factors should be 
explored by building pattern vectors more carefully than by 
simply considering all moves in all games of a player. Impact 
of handicap and uneven games on by-strength p distribution 
should be also investigated. 

VIII. Conclusion 

We have proposed a way to extract summary pattern in- 
formation from game collections and combined this with 
various data mining methods to show correspondence of our 
pattern summaries with various player meta-information such 
as playing strength, era of play or playing style, as ranked by 
expert players. We have implemented and measured our pro- 
posals in two case studies: per-rank characteristics of amateur 
players and per-player style/era characteristics of well-known 
professionals. 

While many details remain to be worked out, we have 
demonstrated that many significant correlations doubtlessly 
do exist and it is practically viable to infer the player meta- 
information from extracted pattern summaries and we have 
proposed applications for such inference. Finally, we outlined 
some of the many possible directions of future work in this 
newly staked research field on the boundary of Computer Go, 
Data Mining and Go Theory. 
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